"Memory foam" approach to unsupervised learning

Natalia B. Janson and Christopher J. Marsden
School of Mathematics, Loughborough University, Loughborough LE11 3TU, UK

We propose an alternative approach to constructing an artificial learning system that naturally learns in an unsupervised manner. Its mathematical prototype is a dynamical system that automatically shapes its vector field in response to the input signal. The vector field converges to the negative gradient of a potential equal to the multi-dimensional probability density distribution of the input process taken with a negative sign. The most probable patterns are represented by the stable fixed points, whose basins of attraction are formed automatically. The performance of this system is illustrated with musical signals.

PACS numbers: 05.45.-a, 05.40.-a, 07.05.Mh, 87.19.lv



The tasks being posed to, and solved by, modern artificial "intelligent" (AI) devices are broad and include image and speech recognition, machine vision, language processing and medical diagnostics, to mention just a few [1]. However, in spite of the word "intelligence" behind the AI abbreviation, in essence these machines are only able to perform two tasks: classification and optimization, which includes decision-making. Learning has been understood merely as acquiring the ability to perform these tasks.

The performance of modern AI devices is based on algorithms, i.e. while fulfilling their goal they perform a sequence of pre-defined commands. Even the latest generation of AI devices, those based on neural networks, employ algorithms, at least at the stage of learning [2]. By contrast, a biological brain does not seem to naturally execute a sequence of commands, although it can be trained to do so (often with some effort, e.g. when solving routine mathematical problems). In particular, the brain does not seem to learn by an algorithm.

As can be expected from algorithm-based devices, their natural way of learning generally requires a teacher (i.e. a truly intelligent system), and learning can be fully supervised, semi-supervised [3] or reinforcement-based [4]. Unsupervised learning, as defined within the AI field, is the acquisition of the ability to attribute a new entry to a certain class without any help from a teacher [5].

In this Letter we propose an alternative approach to describing a learning process. Namely, we suggest that a thinking system should work as a machine that adjusts its architecture in response both to sensory input and to the processes inside itself in an analogue (i.e. non-algorithmic) way. We introduce a mathematical prototype of this machine: a dynamical system that shapes its vector field in response to the external stimulus; i.e. we describe the first component of the thinking process. The model does not rely on any biological knowledge.

Let every (scalar or vector) value of the input at a given time moment represent a certain pattern, which can be of any origin: visual, auditory, tactile, olfactory, or their combination. It could be the color of an image, the pitch of a sound, etc. The implementation of non-algorithmic classification (pattern recognition) was proposed in [6] by means of neural networks (NNs): collections of units, each with a fixed architecture, which are flexibly coupled to each other. However, learning in an NN is algorithmic and consists of adjusting the strengths of couplings ("weights") in response to a training set of patterns. As a result, an energy profile is formed in the phase space of the NN [7], whose minima (attracting fixed points) represent the centres of classes, and the respective basins of attraction represent classes. When learning is over, the weights are fixed, new input patterns are given as initial conditions, and classification occurs non-algorithmically as the NN evolves towards the nearest attractor [2]. A series of technical problems can occur as an NN learns, including the formation of spurious attractors. Also, the most natural way of learning for an NN is supervised, while semi- or unsupervised learning requires considerable complication of the algorithms.

Here, we propose the construction of a dynamical system whose vector field is the negative gradient of a potential energy that is shaped by the external stimulus non-algorithmically and without supervision. If the stimulus comes from a stationary and ergodic random process, this "energy" becomes the multi-dimensional probability density distribution of the input taken with a negative sign, and each stable fixed point represents the most probable pattern from the respective input class. The system recognizes new patterns just like a particle placed into a potential energy profile $V(x)$ moves towards the nearest minimum, possibly being affected by noise, according to [8]

$$\frac{dx}{dt} = -\nabla V(x) + \xi(t), \qquad (1)$$

where $x$ represents the location in $N$-dimensional space, and $\xi(t)$ is noise.
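Eq. (1) can be illustrated with a one-dimensional toy potential. The sketch below is a minimal illustration, assuming a hypothetical double-well profile $V(x)=(x^2-1)^2$ (not a learnt foam) and Gaussian white noise of adjustable intensity, integrated with the Euler-Maruyama scheme; the function names are hypothetical. Without noise, a particle ends at the minimum ($x=1$ or $x=-1$) whose basin of attraction contains its starting point.

```python
import numpy as np

def dV_dx(x):
    """Derivative of the hypothetical double-well potential V(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x**2 - 1.0)

def settle(x0, noise=0.0, dt=1e-2, steps=5000, seed=0):
    """Euler-Maruyama integration of Eq. (1), dx/dt = -dV/dx + xi(t), in 1D.

    xi(t) is taken (as an assumption) to be Gaussian white noise of intensity `noise`.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(steps):
        x += -dV_dx(x) * dt + noise * np.sqrt(dt) * rng.standard_normal()
    return x

# Without noise each particle ends at the minimum whose basin contains its start.
print(settle(0.3), settle(-0.2))   # approximately 1.0 and -1.0
```

Setting `noise` above zero lets the particle fluctuate around the minimum it reaches, which is the "possibly being affected by noise" case of Eq. (1).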

Model. It is based on a loose analogy with the "memory foam" used in orthopedic mattresses, which takes the shape of a body pressed against it but slowly returns to its original shape after the pressure is removed. Assume that we have a one-dimensional "foam" stretched in the $x$ direction, and that initially it is flat, i.e. its profile is $U(x)=0$ (Fig. 1, $t=0$). If a stone drops onto the foam at position $x=\eta$, the foam profile is deformed: a dent appears, which is deepest exactly at $x=\eta$ and gets shallower at larger distances from $\eta$ (Fig. 1, $t=1$). In other words, the foam learns about the stone and its position. Also, assume that the foam is elastic with elasticity factor $k$, which models the capacity to forget: the deeper the dent at position $x$, the faster the foam tries to come back to $U=0$ (to forget). Now assume that we subject the foam to an external stimulus $\eta(t)$, as if at every new time moment $t$ a new stone drops at a new position $x=\eta(t)$ (Fig. 1, $t=2$), thus shaping the "foam" continuously. The signal $\eta(t)$ can be either deterministic or stochastic in nature and can have arbitrary statistical properties. Next we derive an equation that describes the evolution of the foam profile $U(x,t)$ under the influence of $\eta(t)$.

FIG. 1: (Color online.) Illustration of the idea of memory foam: the profile $U(x,t)$.

Consider how the foam profile changes over a small but finite time interval $\Delta t$:

$$U(x,t+\Delta t) = U(x,t) - g(x-\eta)\,\Delta t - kU(x,t)\,\Delta t, \qquad (2)$$

where $g(z)$ is some non-negative bell-shaped function describing the shape of a single dent, e.g. a Gaussian function, $g(z)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{z^2}{2\sigma^2}\right)$. The natural initial conditions would be $U(x,0)=0$; however, as will be shown below, the limiting shape of the foam does not depend on the initial conditions if $\eta(t)$ is ergodic and $k=0$.
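A single update of Eq. (2) on a discretized $x$ axis can be sketched as follows, assuming a uniform grid and the Gaussian $g$ given above; the function names are hypothetical. The quantities computed at the end reflect the stated dent shape (deepest at $x=\eta$) and the fact that, because this $g$ integrates to 1, one stone lowers the total area under $U$ by exactly $\Delta t$.

```python
import numpy as np

def gaussian_dent(z, sigma):
    """Normalized Gaussian g(z) describing the shape of a single dent."""
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def foam_step(U, x, eta, dt, k, sigma):
    """One step of Eq. (2): U(x, t+dt) = U(x, t) - g(x - eta) dt - k U(x, t) dt."""
    return U - gaussian_dent(x - eta, sigma) * dt - k * U * dt

x = np.linspace(-5.0, 5.0, 1001)      # spatial grid, spacing 0.01
U = np.zeros_like(x)                  # flat foam: U(x, 0) = 0
U = foam_step(U, x, eta=1.5, dt=0.1, k=0.0, sigma=0.5)   # one "stone" at x = 1.5
deepest = x[np.argmin(U)]             # the dent is deepest at x = eta
dent_area = U.sum() * (x[1] - x[0])   # approximately -dt, since g integrates to 1
```

With $k>0$ the last term pulls every point of the profile back towards $U=0$ in proportion to its depth, which is the "forgetting" described in the text.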

In (2), move $U(x,t)$ to the left-hand side, divide both sides by $\Delta t$, and take the limit as $\Delta t\to 0$ to obtain

$$\frac{\partial U(x,t)}{\partial t} = -g(x-\eta) - kU(x,t). \qquad (3)$$



It can be shown by numerical simulation with some arbitrary $\eta(t)$ that, for $k=0$, the solution $U(x,t)$ has a linear trend, i.e. it behaves as a linearly decaying function of $t$ with superimposed fluctuations. We wish to eliminate this trend and see if we can achieve some sort of stationary behavior. Perform the change of variables

$$V=\frac{U}{t},\qquad \frac{\partial V}{\partial t}=\frac{1}{t}\left(\frac{\partial U}{\partial t}-V\right),\qquad \frac{\partial U}{\partial t}=t\,\frac{\partial V}{\partial t}+V,$$

and rewrite (3) as follows:

$$\frac{\partial V}{\partial t} = -\frac{1}{t}\left[V + g(x-\eta)\right] - kV. \qquad (4)$$

FIG. 2: (Color online.) Evolution of the "memory foam" $V(x,t)$ as the random stimulus is applied, obtained by numerically simulating Eq. (4): (a,c) 3D view; (b,d) projection of $V(x,t)$ onto the $(x,t)$ plane shown by color (shade of grey), with the applied stimulus shown by filled circles. In (a,c) the probability density distribution of the stimulus is given by the solid line at the front. In (a,b) the consecutive values of the stimulus are uncorrelated, and in (c,d) they are correlated [9].



Evolution of the foam profile $V(x,t)$ is illustrated in Fig. 2, (a) in 3D and (b) in its projection onto the $(x,t)$ plane, as the signal shown by filled circles in (b) is applied at each consecutive time moment $t$. Eq. (4) has the same form if the stimulus $\eta$ is a vector of dimension $N$; then $x$ is a vector, and $V$ and $g$ are functions of $N$ variables.

Proof of shaping into the input density. Consider the evolution of $V(x,t)$, where the $N$-dimensional input vector $\eta(t)$ is a realization of a strict-sense stationary and ergodic random process $H(t)$ with some arbitrary probability density distribution (PDD) $p_H(\eta_1,\eta_2,\ldots,\eta_N)$. Due to stationarity, $p_H$ does not change in time; due to ergodicity, any single realization $\eta(t)$ contains all information about $p_H$, i.e. any statistical characteristic can be obtained from $\eta(t)$ by averaging over time, rather than over the ensemble of realizations that would have been required for a non-ergodic process [10]. Below we will show that with time, $V$ takes the shape of $-p_H$.

Assume that $k=0$, i.e. that the foam does not forget what it has learnt. Multiply both sides of Eq. (4) by $t\,dt$ and integrate from some $t_0>0$ to $t$. A stationary behavior of $V$ implies that $t\,\partial V/\partial t$ vanishes on average, and therefore

$$\lim_{t\to\infty}\frac{1}{t}\int_{t_0}^{t} t'\,\frac{\partial V(x,t')}{\partial t'}\,dt' = 0. \qquad (5)$$

Consider the corresponding integral of the r.h.s. of Eq. (4), multiplied by $t$, and its limit as $t\to\infty$:

$$-\lim_{t\to\infty}\frac{1}{t}\int_{t_0}^{t}\Big(V(x,t') + g\big(x-\eta(t')\big)\Big)\,dt', \qquad (6)$$

representing the (negative) time average $\langle V + g(x-\eta)\rangle$ of the expression under the integral. The term $g(x-H)$ is a nonlinear smooth function of the ergodic process $H$. As proved in [11], "zero-memory nonlinear operations on ergodic processes are ergodic"; therefore, $g(x-H)$ is also an ergodic random process. Thus we can replace the time average in (6) by the statistical average:

$$\langle V + g(x-H)\rangle = \int V\,p_H(\eta)\,d\eta + \int g(x-\eta)\,p_H(\eta)\,d\eta. \qquad (7)$$

In the above, the integral with respect to $\eta$ represents, for brevity, $N$ integrals with respect to the components $\eta_1,\ldots,\eta_N$ of the vector $\eta$. Since $V$ does not depend on $\eta$ explicitly and $p_H$ is normalized, the first term on the right-hand side of (7) is equal to $V$. The second term is the convolution of $p_H(\eta)$ with the function $g(\eta)$. If $g(x-\eta)=\delta(x-\eta)$, where $\delta(z)$ is the Dirac delta-function of several variables, this term is equal to $p_H(x)$ due to the sifting property of the delta-function [12]. From (4) combined with (5) it follows that the expression (7) is equal to 0. We have therefore shown that as time $t$ goes to infinity, $V(x,t)$ tends to $-p_H(x)$, provided that $g(z)$ tends to the Dirac delta-function.
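A more direct route to the same limit, sketched here under the assumption $k=0$, works with $U$ rather than $V$. Integrating Eq. (3) from $0$ to $t$ gives $U(x,t)=U(x,0)-\int_0^t g\big(x-\eta(t')\big)\,dt'$, and dividing by $t$:

$$V(x,t)=\frac{U(x,0)}{t}-\frac{1}{t}\int_0^{t} g\big(x-\eta(t')\big)\,dt' \;\xrightarrow[t\to\infty]{}\; -\int g(x-\eta)\,p_H(\eta)\,d\eta ,$$

where the last step replaces the time average by the statistical average, using the ergodicity of $g(x-H)$. The first term decays as $1/t$, which makes explicit why the limiting shape does not depend on the initial profile when $k=0$. For a kernel of finite width the limit is the input density smoothed by $g$, and it approaches $-p_H(x)$ as $g$ approaches the delta-function.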

In Fig. 2 the evolution of $V(x,t)$ is illustrated as two kinds of scalar stimuli are applied to the one-dimensional foam. Their PDDs have a similar two-peak shape (see the solid lines at the front in (a,c)), but two consecutive values are uncorrelated in (a,b) and correlated in (c,d) [9]. The actual signals applied are shown by filled circles in (b,d), and in $g(z)$ we used $\sigma^2=\ldots$. One can see that eventually both foams take the shape of the respective PDDs (with a negative sign), but if the stimulus values are uncorrelated, the convergence is faster.
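The uncorrelated case can be reproduced qualitatively with a few lines of code. This is a minimal sketch under stated assumptions, not the simulation behind Fig. 2: explicit Euler steps of Eq. (4) with $k=0$ and $\Delta t=1$ per stimulus value, a hypothetical start time $t_0=10$ with a deliberately non-flat initial profile, a hypothetical i.i.d. stimulus drawn from an equal mixture of two Gaussians, and a Gaussian $g$ of width $\sigma=0.3$. For a finite-width kernel the expected limit is minus the stimulus density smoothed by $g$.

```python
import numpy as np

def gaussian(z, sigma):
    """Normalized Gaussian kernel g(z)."""
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def simulate_foam_V(x, stimulus, sigma, t0=10.0, dt=1.0, k=0.0, V0=None):
    """Explicit Euler integration of Eq. (4),
    dV/dt = -(1/t) [V + g(x - eta)] - k V, with one stimulus value per step."""
    V = np.zeros_like(x) if V0 is None else np.array(V0, dtype=float)
    t = t0
    for eta in stimulus:
        V = V + dt * (-(V + gaussian(x - eta, sigma)) / t - k * V)
        t += dt
    return V

rng = np.random.default_rng(0)
n = 20000
# Hypothetical i.i.d. stimulus: equal mixture of N(-2, 0.5**2) and N(2, 0.5**2).
stimulus = rng.choice([-2.0, 2.0], size=n) + 0.5 * rng.standard_normal(n)

x = np.linspace(-5.0, 5.0, 401)
sigma = 0.3
V = simulate_foam_V(x, stimulus, sigma, V0=np.sin(3 * x))   # non-flat start

# Expected limit: minus the stimulus density smoothed by g, i.e. the same
# mixture with variance 0.5**2 + sigma**2 (tends to -p_H as sigma -> 0).
s = np.sqrt(0.5**2 + sigma**2)
target = -0.5 * (gaussian(x + 2.0, s) + gaussian(x - 2.0, s))
print(np.max(np.abs(V - target)))   # small compared with max|target| (about 0.34)
```

Replacing `stimulus` by a correlated sequence with the same marginal density is one way to explore the slower convergence seen in Fig. 2(c,d).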

This shaping mechanism is reminiscent of kernel density estimation used in statistics [13], but it is dynamical as opposed to algorithmic, and it places no requirement of independence on the inputs to the system. If $H(t)$ is not stationary, the foam evolves into a time-averaged density of the input.
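The resemblance to kernel density estimation can be made concrete. In the minimal sketch below, assuming $k=0$, a flat start $U(x,0)=0$, a constant step $\Delta t$, a Gaussian $g$ and a hypothetical handful of stimulus values, $n$ steps of Eq. (2) followed by $V=U/t$ at $t=n\,\Delta t$ give exactly minus the standard Gaussian kernel density estimate built from the same values.

```python
import numpy as np

def gaussian(z, sigma):
    """Normalized Gaussian kernel g(z)."""
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def foam_after(x, stimulus, sigma, dt=1.0):
    """Iterate Eq. (2) with k = 0 from a flat foam U(x, 0) = 0; return V = U / t."""
    U = np.zeros_like(x)
    for eta in stimulus:                      # one stone per time step
        U = U - gaussian(x - eta, sigma) * dt
    return U / (len(stimulus) * dt)           # t = n * dt

def gaussian_kde(x, samples, sigma):
    """Batch Gaussian kernel density estimate: (1/n) * sum_i g(x - eta_i)."""
    return gaussian(x[:, None] - samples[None, :], sigma).mean(axis=1)

stimulus = np.array([-1.2, -0.9, 0.1, 1.7, 2.0, 2.2])   # hypothetical inputs
x = np.linspace(-4.0, 4.0, 81)
V = foam_after(x, stimulus, sigma=0.4)
print(np.allclose(-V, gaussian_kde(x, stimulus, sigma=0.4)))   # True
```

The foam reaches this profile incrementally, one stimulus value at a time, storing only the current profile rather than the past inputs.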

Application to musical data. Next, we illustrate how the proposed foam discovers and memorises musical notes and phrases. The children's song "Mary had a little lamb" was performed on a flute by an amateur musician six times. The song involves three musical notes (A, B and G), consists of 32 beats, and was chosen for its simplicity to illustrate the principle. The signal was recorded as a wave-file with a sampling rate of 8 kHz. In agreement with what is usually done in speech recognition [14], the short-time Fourier transform [15] was applied to the waveform with a sliding window of duration $\tau=0.75$ sec, which was roughly the duration of each note. The highest spectral peak was extracted for each window, which corresponded to the main frequency $f$ Hz of the given note. The sequence of frequencies $f(t)$ was used to stimulate the foam. Note that each value of $f(t)$ was slightly different from the exact frequency of the respective note because of the natural variability introduced by a human musician, and the signal $f(t)$ was in fact random, as seen from Fig. 3(b).

FIG. 3: (Color online.) Flute: musical note recognition. Notations are as in Fig. 2.
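The note-extraction step can be sketched as follows. This is a minimal illustration rather than the processing chain used for Fig. 3: it assumes non-overlapping windows by default, a Hann taper (a choice not stated in the text), and synthetic pure tones at the nominal frequencies of A4, B4 and G4 in place of the flute recording; the function name is hypothetical. With $\tau=0.75$ s the frequency resolution is $1/\tau\approx 1.33$ Hz.

```python
import numpy as np

def dominant_frequencies(signal, fs, window_s, hop_s=None):
    """Frequency (Hz) of the highest spectral peak in each short-time Fourier window.

    Non-overlapping windows by default; a Hann taper is applied (an assumption).
    """
    n_win = int(round(window_s * fs))
    n_hop = n_win if hop_s is None else int(round(hop_s * fs))
    taper = np.hanning(n_win)
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    peaks = []
    for start in range(0, len(signal) - n_win + 1, n_hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + n_win] * taper))
        peaks.append(freqs[np.argmax(spectrum)])
    return np.array(peaks)

fs = 8000                                  # Hz, sampling rate of the recording
tau = 0.75                                 # s, window length
notes = [440.0, 493.88, 392.0]             # nominal A4, B4, G4 (synthetic stand-in)
t = np.arange(int(round(tau * fs))) / fs
signal = np.concatenate([np.sin(2 * np.pi * f0 * t) for f0 in notes])
f_est = dominant_frequencies(signal, fs, tau)
print(f_est)   # each within one frequency bin (about 1.33 Hz) of the nominal note
```

Passing a `hop_s` shorter than `window_s` gives overlapping windows, i.e. a denser sequence $f(t)$.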

First, we illustrate how individual musical notes can be automatically identified. A one-dimensional foam received the signal $\eta(t)=f(t)$, resampled to 8 Hz to save computation time. The function $f(t)$ can be seen as a realization of a 1st-order stationary and ergodic process $F(t)$, consisting of infinitely many repetitions of the same song, which we observe during a finite time. This process has a one-dimensional PDD $p_F(f)$, which does not change in time. A Gaussian kernel $g(z)$ was used with $\sigma_z=\sqrt{\ldots}$ Hz. As shown in Fig. 3(a), the foam converges to some PDD shown by the solid line. It automatically discovers the most probable frequencies as follows (figures in brackets show the exact frequencies of the respective musical notes): 434 Hz (440 Hz) for A4, 490 Hz (493.88 Hz) for B4, and 388 Hz (392 Hz) for G4.
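Reading the most probable frequencies off a one-dimensional foam amounts to locating the minima of $V(x)$. A minimal sketch, assuming the limiting foam for $k=0$ is approximated by $V(x)=-\frac{1}{n}\sum_i g(x-f_i)$ over observed frequencies $f_i$, with hypothetical values scattered by up to 2 Hz around the nominal notes (not the recorded data), a hypothetical kernel width of 2 Hz, and a relative depth threshold introduced here to ignore numerically shallow minima:

```python
import numpy as np

def gaussian(z, sigma):
    """Normalized Gaussian kernel g(z)."""
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def foam_minima(x, V, rel_depth=0.05):
    """Positions of strict local minima of V deeper than rel_depth * |min V|."""
    inner = (V[1:-1] < V[:-2]) & (V[1:-1] < V[2:])
    idx = np.where(inner)[0] + 1
    return x[idx[V[idx] < rel_depth * V.min()]]

notes = np.array([392.0, 440.0, 493.88])              # nominal G4, A4, B4 in Hz
jitter = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])        # hypothetical player variability
f_samples = (notes[:, None] + jitter[None, :]).ravel()

x = np.linspace(350.0, 530.0, 1801)                   # 0.1 Hz grid
# Limiting foam for k = 0: minus the kernel-smoothed density of the samples.
V = -gaussian(x[:, None] - f_samples[None, :], 2.0).mean(axis=1)
print(foam_minima(x, V))   # close to 392.0, 440.0 and 493.9
```

Because the hypothetical jitter here is symmetric, the minima sit at the nominal notes; a player who is systematically flat would shift them, as in the frequencies reported above.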

Second, we show how the foam can discover and memorize temporal patterns, namely musical phrases consisting of four beats. A 4D foam was used, and to each of its channels the same signal $f(t)$ was applied, but with a time shift. Namely, at each time $t$ the foam received a vector stimulus $\psi(t)=\big(f(t), f(t+\tau), f(t+2\tau), f(t+3\tau)\big)$, $\tau=0.75$ sec. For the purpose of this part, we can regard $\psi(t)$ as a realization of a 4th-order stationary and ergodic vector random process $\Psi(t)$ (which we observe during a finite time) with 4D PDD $p_\Psi(f_1,f_2,f_3,f_4)$. We used a multivariate Gaussian kernel $g$ with $\sigma_z=\sqrt{\ldots}$ Hz in each of its four variables.
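The four-dimensional stimulus is a delay embedding of $f(t)$. A minimal sketch, assuming $f(t)$ has already been resampled at 8 Hz so that $\tau=0.75$ s corresponds to a lag of 6 samples; the function name `delay_vectors` and the stand-in ramp signal are hypothetical.

```python
import numpy as np

def delay_vectors(f, lag, dim=4):
    """Rows are (f[j], f[j + lag], ..., f[j + (dim - 1) * lag])."""
    f = np.asarray(f, dtype=float)
    n = len(f) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("signal too short for this lag and dimension")
    return np.stack([f[i * lag : i * lag + n] for i in range(dim)], axis=1)

fs_resampled = 8                          # Hz, as in the text
tau = 0.75                                # s
lag = int(round(tau * fs_resampled))      # 6 samples
f = np.arange(30.0)                       # hypothetical stand-in for f(t)
psi = delay_vectors(f, lag)
print(psi.shape, psi[0])                  # shape (12, 4); first row 0, 6, 12, 18
```

Each row is one stimulus $\psi(t)$ fed to the 4D foam; the last $3\tau$ of the signal cannot start a full phrase and is therefore not used as a starting point.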
FIG. 4: (Color online.) Musical phrase recognition. Description is in text.



One cannot visualize the evolution of a 4D foam in the same way as we did in Figs. 2 and 3, so we use an alternative representation. We take four half-axes and make their origins coincide (Fig. 4(a)). For each feasible input $\psi=(f_1,f_2,f_3,f_4)$ we put a point with coordinate $f_i$ on the $i$-th half-axis and connect the four points by lines. Thus, any feasible input pattern is represented by a polygon on a plane. (This can be done for any dimension of the input vector.) The value of $p_\Psi$ at each point can be represented by the color of the respective polygon (Fig. 4(b)). The polygon whose color is the darkest is the most probable pattern. Unfortunately, when too many polygons overlap, it might be difficult to see the darkest ones. But they can be found using a particle in the 4D foam, which will go to the most probable pattern: five such patterns are shown at a smaller scale in Fig. 4(c).
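Finding a most probable pattern with a particle amounts to integrating Eq. (1) without noise on the learnt potential. A minimal sketch, assuming the limiting foam $V(x)=-\frac{1}{n}\sum_i g(x-\psi_i)$ with an isotropic Gaussian $g$, explicit Euler steps with a hand-picked step size, and two hypothetical clusters of 4D patterns in arbitrary units (not musical data); the function names are hypothetical. A particle started near either cluster slides to that cluster's centre, which is a minimum of $V$.

```python
import numpy as np

def grad_V(x, samples, sigma):
    """Gradient of V(x) = -(1/n) * sum_i g(x - eta_i), isotropic Gaussian g in N dims."""
    diff = x[None, :] - samples                                   # shape (n, N)
    N = samples.shape[1]
    g = (2 * np.pi * sigma**2) ** (-N / 2) * np.exp(-np.sum(diff**2, axis=1) / (2 * sigma**2))
    # grad g(x - eta) = -g * (x - eta) / sigma**2; the minus sign of V flips it back.
    return (g[:, None] * diff).sum(axis=0) / (len(samples) * sigma**2)

def descend(x0, samples, sigma, dt=20.0, steps=2000):
    """Noise-free Euler integration of Eq. (1): dx/dt = -grad V(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= dt * grad_V(x, samples, sigma)
    return x

# Two hypothetical clusters of 4D patterns (arbitrary units), 9 points each.
offsets = np.vstack([np.zeros(4), 0.3 * np.eye(4), -0.3 * np.eye(4)])
centre_a, centre_b = np.zeros(4), np.full(4, 5.0)
samples = np.vstack([centre_a + offsets, centre_b + offsets])

print(descend([1.0, 1.0, 1.0, 1.0], samples, sigma=1.0))   # close to centre_a
print(descend([4.0, 4.0, 4.0, 4.0], samples, sigma=1.0))   # close to centre_b
```

The step size `dt` is tied to the scale of $V$: with a much larger `dt` the Euler iteration would overshoot the minimum, so it would need retuning for other kernel widths or data units.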

Recognition of musical phrases is also illustrated by the supplementary wave-files [16].

Discussion. The memory foam approach presented here might pave the way to creating a new generation of information processing machines. Unlike both digital computers and neural networks, these devices would be fully analogue and in this sense closer to biological brains. The proposed approach naturally implements unsupervised learning, which is traditionally more challenging than other types of learning; however, supervision can be implemented at any stage, if required. Also, the "memory foam" can combine learning with pattern recognition, i.e. function in the "on-line learning" regime. The importance of being able to create hierarchies of patterns in AI devices cannot be overstated (see e.g. [17]). With a musical example we demonstrated how hierarchies of patterns can be created in a dynamical way, by going from single notes to their combinations.

A well-known major problem arising in connection with AI performance is the so-called "curse of dimensionality". As the problem becomes more complicated, the number of states of a traditional AI device grows very quickly and becomes too large for the computer memory or for the connectivity of artificial NNs. The "curse" can be worked around [18], but there is always a price (e.g. the duration of calculations). The "memory foam" device would not require connectivity similar to that in NNs, and might provide a solution to the "curse" problem.